Normalization and Transformation Techniques for Robust Speaker Recognition

Authors

  • Dalei Wu
  • Baojie Li
  • Hui Jiang
Abstract

Recognizing a person’s identity by voice is one of the intrinsic capabilities of human beings. Automatic speaker recognition (SR) is the computational counterpart: the task of having a computer recognize a person’s identity from voice characteristics. Taking a voice signal as input, an automatic speaker recognition system extracts distinctive information from it, usually with signal processing techniques, and then recognizes the speaker’s identity by comparing the extracted information with knowledge learned earlier at a training stage. The extracted information is encoded as a sequence of feature vectors, referred to as a frame sequence. By application, SR tasks fall into two categories: speaker identification and speaker verification. Speaker identification (SI) determines a speaker’s identity from a given group of enrolled speakers. If the speaker is assumed always to belong to the enrolled group, the task is called closed-set speaker identification; otherwise, it is called open-set speaker identification. Speaker verification (SV), on the other hand, verifies a claimed identity by making a binary decision, i.e., answering the identity question with either yes or no. SV is one of several biometric authentication techniques, alongside others such as fingerprint (Jain et al., 2000) and iris authentication (Daugman, 2004). Over the past decades, a variety of modeling and decision-making techniques have been proposed for speaker recognition and have proved effective to some extent. In this chapter, we do not survey these techniques in depth, but instead focus on normalization and transformation techniques for robust speaker recognition. For a tutorial on conventional modeling and recognition techniques, the reader can refer to (Campbell, 1999; Reynolds, 2002; Bimbot et al., 2004).
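
The three task definitions above differ mainly in the decision rule applied to match scores. The sketch below assumes that each enrolled speaker model has already been scored against the input frame sequence (higher score means a better match) and that the open-set and verification thresholds are hypothetical tuning parameters; the function names and score values are illustrative, not the chapter's method.

```python
from typing import Dict, Optional

def closed_set_identify(scores: Dict[str, float]) -> str:
    """Closed-set SI: the speaker is assumed to be enrolled, so return the
    enrolled speaker whose model scores highest."""
    return max(scores, key=lambda spk: scores[spk])

def open_set_identify(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Open-set SI: take the best-scoring enrolled speaker, but return None
    (treat the speaker as not enrolled) if that best score is below threshold."""
    best = closed_set_identify(scores)
    return best if scores[best] >= threshold else None

def verify(claimed_score: float, threshold: float) -> bool:
    """SV: a binary yes/no decision on the score of the claimed identity."""
    return claimed_score >= threshold

# Hypothetical log-likelihood-style scores for one test utterance.
scores = {"alice": -41.2, "bob": -38.7, "carol": -45.0}
print(closed_set_identify(scores))        # bob
print(open_set_identify(scores, -40.0))   # bob
print(open_set_identify(scores, -35.0))   # None
print(verify(scores["alice"], -40.0))     # False
```

Open-set identification is written as closed-set identification plus a rejection threshold, which mirrors the definition above: the only difference is that the best match may still be rejected.
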
Among these many techniques, the most successful are the Gaussian mixture model (GMM) and the hidden Markov model (HMM). With GMM/HMM, high performance can be achieved under favorable working conditions, such as a quiet environment and broadband speech. However, these techniques run into problems in realistic applications, which often cannot satisfy the requirement of a clean, quiet environment. Instead, working environments are frequently adverse and noisy, and sometimes narrow-band, as with telephone speech. The performance of most SR systems degrades substantially under such adverse conditions. Robust speaker recognition is the topic of study that addresses these difficulties.
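
To make the GMM approach concrete, the sketch below scores a frame sequence against speaker models as the average per-frame log-likelihood under a GMM. This passage does not specify a scoring procedure; diagonal covariances, frame independence, pre-trained parameters (training, e.g. by EM, is not shown), and the toy model values are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def log_gaussian_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of x under a Gaussian with diagonal covariance diag(var)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def gmm_frame_loglik(x, weights, means, variances) -> float:
    """log p(x) = log sum_k w_k N(x; mu_k, diag(var_k)) for one frame."""
    return log_sum_exp([
        math.log(w) + log_gaussian_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ])

def gmm_score(frames, weights, means, variances) -> float:
    """Average per-frame log-likelihood of a frame sequence, treating
    frames as independent."""
    total = sum(gmm_frame_loglik(f, weights, means, variances) for f in frames)
    return total / len(frames)

# Hypothetical 2-D toy models: (weights, means, variances) per speaker.
models = {
    "speaker_a": ([0.6, 0.4], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [0.5, 0.5]]),
    "speaker_b": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
frames = [[0.1, -0.2], [0.9, 1.1], [0.0, 0.3]]
scores = {name: gmm_score(frames, *params) for name, params in models.items()}
print(max(scores, key=lambda name: scores[name]))  # speaker_a
```

Averaging over frames keeps scores comparable across utterances of different lengths; the log-sum-exp step avoids underflow when individual component densities are very small, which is common with real feature dimensions.
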

Publication date: 2012